Improved Interpretability


Regularizing Black-box Models for Improved Interpretability

Neural Information Processing Systems

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model-agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. We verify that improving these metrics leads to significantly more useful explanations with a user study on a realistic task.
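
To make the regularization concrete, here is a minimal sketch of a neighborhood-fidelity penalty in the spirit of ExpO, assuming PyTorch and a differentiable model f; the function fidelity_regularizer and the hyperparameters sigma, n_samples, and lam are illustrative stand-ins, not the paper's exact formulation. It samples a Gaussian neighborhood around each input, fits a local linear model to f in closed form, and penalizes the residual, rewarding models that are locally well approximated by simple explanations.

```python
import torch

def fidelity_regularizer(f, x, sigma=0.1, n_samples=32):
    """Penalize how poorly a local linear model fits f around each input.

    x: (batch, d) inputs. Returns a scalar regularization term.
    """
    batch, d = x.shape
    # Sample a Gaussian neighborhood around every input point.
    noise = sigma * torch.randn(batch, n_samples, d, device=x.device)
    neighbors = x.unsqueeze(1) + noise                        # (batch, n, d)
    preds = f(neighbors.reshape(-1, d)).reshape(batch, n_samples, -1)
    # Fit a linear model (with intercept) to f on each neighborhood in
    # closed form; pinv keeps the residual differentiable w.r.t. preds.
    ones = torch.ones(batch, n_samples, 1, device=x.device)
    X = torch.cat([neighbors, ones], dim=-1)                  # (batch, n, d+1)
    beta = torch.linalg.pinv(X) @ preds                       # (batch, d+1, out)
    residual = X @ beta - preds
    return residual.pow(2).mean()

# Training objective: ordinary task loss plus the fidelity penalty.
# loss = task_loss(f(x), y) + lam * fidelity_regularizer(f, x)
```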


Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability

Tsang, Michael; Liu, Hanpeng; Purushotham, Sanjay; Murali, Pavankumar; Liu, Yan

Neural Information Processing Systems

Neural networks are known to model statistical interactions, but they entangle the interactions at intermediate hidden layers for shared representation learning. We propose a framework, Neural Interaction Transparency (NIT), that disentangles the shared learning across different interactions to obtain their intrinsic lower-order and interpretable structure. This is done through a novel regularizer that directly penalizes interaction order. We show that disentangling interactions reduces a feedforward neural network to a generalized additive model with interactions, which can lead to transparent models that perform comparably to the state-of-the-art models. NIT is also flexible and efficient; it can learn generalized additive models with maximum $K$-order interactions by training only $O(1)$ models.
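
A minimal sketch of the idea, assuming PyTorch: parallel subnetworks ("blocks") whose outputs are summed form a generalized additive model with interactions, and a soft gate per (block, feature) lets a penalty bound each block's interaction order. The class AdditiveInteractionNet, the sigmoid gating, and the hinge-style order penalty below are simplified stand-ins for NIT's actual regularizer.

```python
import torch
import torch.nn as nn

class AdditiveInteractionNet(nn.Module):
    """Sum of per-block subnetworks: a generalized additive model with interactions."""
    def __init__(self, d_in, n_blocks=8, hidden=32, max_order=2):
        super().__init__()
        self.max_order = max_order
        # One learnable gate per (block, input feature); sigmoid(gate)
        # approximates whether the block uses that feature.
        self.gates = nn.Parameter(torch.zeros(n_blocks, d_in))
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_blocks)
        ])

    def forward(self, x):
        g = torch.sigmoid(self.gates)                 # (n_blocks, d_in)
        # Each block sees only its gated features; summing the block
        # outputs yields an additive model over the blocks.
        outs = [blk(x * g[i]) for i, blk in enumerate(self.blocks)]
        return torch.stack(outs, dim=0).sum(dim=0)    # (batch, 1)

    def order_penalty(self):
        g = torch.sigmoid(self.gates)
        order = g.sum(dim=1)          # soft interaction order of each block
        # Penalize blocks whose soft order exceeds max_order, plus a small
        # sparsity term that pushes unused gates toward zero.
        return torch.relu(order - self.max_order).sum() + 0.01 * g.sum()

# loss = task_loss(model(x), y) + lam * model.order_penalty()
```

Because each block's output depends only on its gated feature subset, a trained model can be read off block by block, one interaction term at a time.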


Reviews: Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability

Neural Information Processing Systems

This paper proposes a novel approach to more interpretable learning in neural networks. In particular, it addresses the common criticism that the computations performed by neural networks are often hard to interpret intuitively, which can be a problem in applications, e.g. in the medical or financial fields. The authors suggest adding a novel regulariser to the weights of the first layer of a neural network to discover non-additive interactions among the data features up to a chosen order, and to preserve these relationships without entangling them. These interactions can then be further processed by the separate columns of the neural network. The approach was evaluated on a number of datasets and seems to perform similarly to the baselines on regression and classification tasks, while being more interpretable and less computationally expensive.
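
A rough sketch of the first-layer penalty the review describes, assuming PyTorch: a group-lasso term over a block-partitioned first-layer weight matrix drives entire feature-to-column connections to zero, capping each column's interaction order. The function first_layer_group_penalty and the block partitioning are illustrative, not the paper's exact construction.

```python
import torch

def first_layer_group_penalty(W, n_blocks):
    """Group-lasso penalty on a block-partitioned first-layer weight matrix.

    W: (hidden, d_in) first-layer weights; hidden must be divisible by n_blocks.
    """
    hidden, d_in = W.shape
    # Partition the hidden units into columns: (n_blocks, hidden_per_block, d_in).
    Wb = W.reshape(n_blocks, hidden // n_blocks, d_in)
    # L2 norm over each column's connections to each input feature; summing
    # these norms drives whole feature-to-column groups to exactly zero,
    # so each column ends up depending on only a few features.
    return Wb.norm(dim=1).sum()

# Example usage on a hypothetical network `net` with first layer `net.fc1`:
# loss = task_loss(net(x), y) + lam * first_layer_group_penalty(net.fc1.weight, n_blocks=8)
```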

